172 research outputs found

    Boolean Matrix Factorization Meets Consecutive Ones Property

    No full text
    Boolean matrix factorization is a natural and popular technique for summarizing binary matrices. In this paper, we study a variant of Boolean matrix factorization in which we additionally require that the factor matrices have the consecutive ones property (OBMF). A major application of this optimization problem comes from graph visualization: standard techniques for visualizing graphs are circular or linear layouts, where nodes are ordered on a circle or on a line. A common problem when visualizing graphs is clutter due to too many edges. The standard way to deal with this is to bundle edges together and represent them as ribbons. We show that we can use OBMF for edge bundling combined with circular or linear layout techniques. We demonstrate not only that this problem is NP-hard but also that no polynomial-time algorithm can yield a multiplicative approximation guarantee (unless P = NP). On the positive side, we develop a greedy algorithm in which each step looks for the best rank-1 factorization. Since even obtaining a rank-1 factorization is NP-hard, we propose an iterative algorithm that fixes one side, finds the other, reverses the roles, and repeats. We show that this step can be done in linear time using PQ-trees. We also extend the problem to the cyclic ones property and to symmetric factorizations. Our experiments show that our algorithms find high-quality factorizations and scale well

    Density-friendly Graph Decomposition

    Full text link
    Decomposing a graph into a hierarchical structure via k-core analysis is a standard operation in any modern graph-mining toolkit. k-core decomposition is a simple and efficient method that allows analyzing a graph beyond its mere degree distribution. More specifically, it is used to identify areas in the graph of increasing centrality and connectedness, and it reveals the structural organization of the graph. Despite the fact that k-core analysis relies on vertex degrees, k-cores do not satisfy a certain, rather natural, density property. Simply put, the most central k-core is not necessarily the densest subgraph. This inconsistency between k-cores and graph density provides the basis of our study. We start by defining what it means for a subgraph to be locally dense, and we show that our definition entails a nested chain decomposition of the graph, similar to the one given by k-cores, but in this case the components are arranged in order of increasing density. We show that such a locally-dense decomposition for a graph G = (V,E) can be computed in polynomial time. The running time of the exact decomposition algorithm is O(|V|²|E|) but is significantly faster in practice. In addition, we develop a linear-time algorithm that provides a factor-2 approximation to the optimal locally-dense decomposition. Furthermore, we show that the k-core decomposition is also a factor-2 approximation; however, as demonstrated by our experimental evaluation, in practice k-cores have a different structure than locally-dense subgraphs, and, as predicted by the theory, k-cores are not always well-aligned with graph density
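    The density notion above (average degree |E|/|V|) is the one behind the classic greedy peeling algorithm for densest subgraph, which is also a factor-2 approximation. A minimal sketch, assuming an undirected simple graph given as an edge list (this is the textbook peeling technique, not the paper's locally-dense decomposition algorithm):

```python
from collections import defaultdict

def densest_subgraph(edges):
    """Greedy peeling: repeatedly delete a minimum-degree vertex and
    remember the snapshot with the highest density |E|/|V|."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_density, best_set = 0.0, set(nodes)
    while nodes:
        density = m / len(nodes)
        if density > best_density:
            best_density, best_set = density, set(nodes)
        u = min(nodes, key=lambda x: len(adj[x]))   # min-degree vertex
        for w in adj[u]:
            adj[w].discard(u)                       # remove u's edges
        m -= len(adj[u])
        del adj[u]
        nodes.remove(u)
    return best_set, best_density
```

    Each snapshot's density is recorded before peeling, so the best intermediate subgraph is returned even if later peels destroy it.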

    Interactive and Iterative Discovery of Entity Network Subgraphs

    Get PDF
    Graph mining to extract interesting components has been studied in various guises, e.g., communities, dense subgraphs, cliques. However, most existing works are based on notions of frequency and connectivity and do not capture subjective interestingness from a user's viewpoint. Furthermore, existing approaches to mining graphs are not interactive and cannot incorporate user feedback in any natural manner. In this paper, we address these gaps by proposing a graph maximum entropy model to discover surprising connected subgraph patterns from entity graphs. This model is embedded in an interactive visualization framework to enable human-in-the-loop, model-guided data exploration. Using case studies on real datasets, we demonstrate how interactions between users and the maximum entropy model lead to faster and more explainable conclusions

    Generating Realistic Synthetic Population Datasets

    Get PDF
    Modern studies of societal phenomena rely on the availability of large datasets capturing attributes and activities of synthetic, city-level, populations. For instance, in epidemiology, synthetic population datasets are necessary to study disease propagation and intervention measures before implementation. In social science, synthetic population datasets are needed to understand how policy decisions might affect preferences and behaviors of individuals. In public health, synthetic population datasets are necessary to capture diagnostic and procedural characteristics of patient records without violating the confidentiality of individuals. To generate such datasets over a large set of categorical variables, we propose the use of the maximum entropy principle to formalize a generative model that, in a statistically well-founded way, optimally utilizes given prior information about the data and is unbiased otherwise. An efficient inference algorithm is designed to estimate the maximum entropy model, and we demonstrate how our approach is adept at estimating underlying data distributions. We evaluate this approach on both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application
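    To see the maximum entropy principle at work in its simplest form: when the only prior information is each attribute's marginal distribution, the maximum entropy model factorizes into a product of independent categoricals. A toy sampler under that assumption (the paper's model handles much richer prior information; the attribute names here are made up for illustration):

```python
import random

def sample_population(marginals, n, seed=0):
    """Draw n synthetic records from the maxent model that matches the
    given per-attribute marginals and assumes nothing else, i.e. the
    product of independent categorical distributions."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        rec = {attr: rng.choices(list(dist), weights=list(dist.values()))[0]
               for attr, dist in marginals.items()}
        records.append(rec)
    return records
```

    Adding constraints beyond marginals (e.g. joint frequencies of attribute pairs) breaks this independence and requires the kind of inference algorithm the abstract describes.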

    Fast Likelihood-Based Change Point Detection

    Get PDF
    Change point detection plays a fundamental role in many real-world applications, where the goal is to analyze and monitor the behaviour of a data stream. In this paper, we study change detection in binary streams. To this end, we use a likelihood ratio between two models as a measure for indicating change. The first model is a single Bernoulli variable, while the second model divides the stored data into two segments and models each segment with its own Bernoulli variable. Finding the optimal split can be done in O(n) time, where n is the number of entries since the last change point. This is too expensive for large n. To combat this, we propose an approximation scheme that yields a (1 − ε) approximation in O(ε⁻¹ log² n) time. The speed-up consists of several steps: first, we reduce the number of possible candidates by adopting a known result from segmentation problems. We then show that for fixed Bernoulli parameters we can find the optimal change point in logarithmic time. Finally, we show how to construct a candidate list of size O(ε⁻¹ log n) for the model parameters. We demonstrate empirically the approximation quality and the running time of our algorithm, showing that we can gain a significant speed-up with a minimal average loss in optimality
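    The exact O(n) baseline that the paper speeds up can be sketched directly: for every split point, compare the maximized log-likelihood of the two-segment Bernoulli model against the single-Bernoulli model (a sketch of the baseline scan only, not the paper's approximation scheme):

```python
import math

def seg_ll(h, n):
    """Maximized log-likelihood of n Bernoulli draws containing h ones."""
    ll = 0.0
    if h > 0:
        ll += h * math.log(h / n)
    if h < n:
        ll += (n - h) * math.log((n - h) / n)
    return ll

def best_split(x):
    """Exact O(n) scan over a binary sequence x: maximize the
    log-likelihood ratio of two segments vs. one."""
    n, total = len(x), sum(x)
    base = seg_ll(total, n)
    best_k, best_lr = None, 0.0
    h = 0
    for k in range(1, n):              # split before index k
        h += x[k - 1]                  # ones in the prefix x[:k]
        lr = seg_ll(h, k) + seg_ll(total - h, n - k) - base
        if lr > best_lr:
            best_k, best_lr = k, lr
    return best_k, best_lr
```

    A large ratio at the best split signals a change; the paper's contribution is evaluating far fewer than n candidate splits while losing at most a (1 − ε) factor.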

    Lymphatic endothelium stimulates melanoma metastasis and invasion via MMP14-dependent Notch3 and β1-integrin activation

    Get PDF
    Lymphatic invasion and lymph node metastasis correlate with poor clinical outcome in melanoma. However, the mechanisms of lymphatic dissemination in distant metastasis remain incompletely understood. We show here that exposure of expansively growing human WM852 melanoma cells, but not singly invasive Bowes cells, to lymphatic endothelial cells (LEC) in 3D co-culture facilitates melanoma distant organ metastasis in mice. To dissect the underlying molecular mechanisms, we established LEC co-cultures with different melanoma cells originating from primary tumors or metastases. Notably, the expansively growing metastatic melanoma cells adopted an invasively sprouting phenotype in 3D matrix that was dependent on MMP14, Notch3 and β1-integrin. Unexpectedly, MMP14 was necessary for LEC-induced Notch3 induction and coincident β1-integrin activation. Moreover, MMP14 and Notch3 were required for LEC-mediated metastasis of zebrafish xenografts. This study uncovers a unique mechanism whereby LEC contact promotes melanoma metastasis by inducing a reversible switch from 3D growth to invasively sprouting cell phenotype

    Genome-scaled phylogeny of Saccharomyces cerevisiae from spontaneous must fermentations

    Get PDF
    Modern winemakers commonly inoculate selected S. cerevisiae strains in must to obtain controlled fermentations and reproducible products. However, wine has been produced for thousands of years using spontaneous fermentations from wild strains, a practice that is experiencing a revival among small wine producers. Despite the widespread usage of such strains in the past, much remains unknown about their ecology, evolution and functional potential. For example, the reciprocal affinities of these strains within the S. cerevisiae phylogeny have yet to be discovered, as well as the degree of their biodiversity and their impact on wine terroir. To fill this knowledge gap, we aim at characterising at the strain level the S. cerevisiae present in spontaneously fermented musts sampled across Italy. We set up a protocol based on polyphenol-removing prewashes, followed by whole-genome shotgun sequencing at a depth of 5 Gb of DNA per sample. We performed both an assembly-free analysis to reconstruct the strain-level phylogeny of S. cerevisiae using the species-specific-marker-based StrainPhlAn, and the reconstruction of metagenome-assembled genomes of these strains for downstream functional analyses. To plan conservation acts in a scenario of continuous climate change, we aim at isolating and maintaining strains of interest. We will present preliminary results from the analysis of spontaneous musts sampled at different fermenting stages

    On Coupling FCA and MDL in Pattern Mining

    Get PDF
    Pattern mining is a well-studied field in data mining and machine learning. The modern methods are based on dynamically updated models, among which MDL-based ones ensure high-quality pattern sets. Formal concepts also characterize patterns in a condensed form. In this paper we study the MDL-based algorithm Krimp in the FCA setting and propose a modified version that benefits from FCA and relies on the probabilistic assumptions that underlie MDL. We provide experimental evidence that the proposed approach improves the quality of the pattern sets generated by Krimp
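    The MDL criterion behind Krimp is easy to state in code: once a code table's patterns have been used to cover the database, each pattern receives a Shannon-optimal code of length −log₂(usage/total), and the data cost is the total coded length. A minimal sketch of that data-cost term only (real Krimp adds the cost of encoding the code table itself, which this omits):

```python
import math

def encoded_length(usages):
    """L(D | CT): total coded length, in bits, of a cover whose patterns
    have the given usage counts. Each pattern's code length is
    -log2(usage / total usage)."""
    total = sum(usages)
    return sum(u * -math.log2(u / total) for u in usages if u > 0)
```

    A candidate pattern is accepted only if adding it lowers this quantity (plus the model cost) — the compression-based selection that yields small, non-redundant pattern sets.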

    Choroidal vascularity map in unilateral central serous chorioretinopathy: A comparison with fellow and healthy eyes

    Get PDF
    Background: To map the choroidal vascularity index and compare the two eyes of patients with unilateral central serous chorioretinopathy (CSCR). Methods: This was a retrospective, observational study performed in patients with unilateral CSCR. Choroidal thickness (CT) and choroidal vascularity index (CVI) were measured and mapped in various zones according to the Early Treatment Diabetic Retinopathy Study (ETDRS) grid. Results: A total of 20 CSCR patients (20 study and 20 fellow eyes) were included in the study. Outer nasal region CT was significantly lower than central CT (p = 0.042) and inner nasal CT (p = 0.007); outer ring CT was significantly less than central (p = 0.04) and inner ring (p = 0.01) CT in CSCR eyes. On plotting all the CVI values against the corresponding CT values, a positive correlation was seen in CSCR eyes (r = 0.54, p < 0.01), which was slightly weaker in fellow eyes (r = 0.3, p < 0.01), and a negative correlation was seen in healthy eyes (r = −0.262, p < 0.01). Conclusions: The correlation between CVI and CT was altered in CSCR eyes as compared to fellow and normal eyes, with increasing CVI towards the center of the macula and superiorly in CSCR eyes

    Fast Generation of Best Interval Patterns for Nonmonotonic Constraints

    Get PDF
    In pattern mining, the main challenge is the exponential explosion of the set of patterns. Typically, to solve this problem, a constraint for pattern selection is introduced. One of the first constraints proposed in pattern mining is the support (frequency) of a pattern in a dataset. Frequency is an anti-monotonic function, i.e., given an infrequent pattern, all of its superpatterns are infrequent. However, many other constraints for pattern selection are neither monotonic nor anti-monotonic, which makes it difficult to generate patterns satisfying these constraints. In this paper we introduce the notion of "generalized monotonicity" and the Sofia algorithm, which together allow generating the best patterns in polynomial time for some nonmonotonic constraints, modulo constraint computation and pattern extension operations. In particular, this algorithm is polynomial for data on itemsets and interval tuples. In this paper we consider stability and the delta-measure, which are nonmonotonic constraints, and apply them to interval tuple datasets. In the experiments, we compute the best interval tuple patterns w.r.t. these measures and show the advantage of our approach over postfiltering approaches
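    The anti-monotonicity of support that the abstract contrasts against is exactly what classic levelwise (Apriori-style) mining exploits: a k-itemset can only be frequent if every (k−1)-subset is. A small sketch of that baseline pruning, for comparison with the nonmonotonic constraints Sofia targets (this is the textbook technique, not the Sofia algorithm):

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Levelwise search using anti-monotone pruning: a candidate is
    kept only if all of its (k-1)-subsets are already frequent."""
    tsets = [set(t) for t in transactions]
    items = sorted({i for t in tsets for i in t})

    def support(s):
        return sum(s <= t for t in tsets)

    frequent = {frozenset([i]) for i in items if support({i}) >= minsup}
    result = set(frequent)
    k = 2
    while frequent:
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # anti-monotone pruning: every (k-1)-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in result
                             for s in combinations(c, k - 1))}
        frequent = {c for c in candidates if support(c) >= minsup}
        result |= frequent
        k += 1
    return result
```

    With a nonmonotonic measure such as stability, this pruning is unsound — a "bad" pattern may still have "good" superpatterns — which is the gap the generalized-monotonicity notion addresses.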